Scientific Python antipatterns advent calendar day eight

For today, a simple string formatting tool that makes code a lot cleaner. As a reminder, I’ll post one tiny example per day with the intention that they should only take a couple of minutes to read.

If you want to read them all but can’t be bothered checking this website each day, sign up for the mailing list:

Sign up for the mailing list

and I’ll send a single email at the end with links to them all.

Using string concatenation rather than f-strings

An extremely common task in scientific programming is to construct a string that includes variables, either for printed output or to generate a file with a particular format. For simple cases, concatenation works fine:

sample_id = 'abc123'

# some more code

print('Processed sample ' + sample_id)
Processed sample abc123

as long as we remember to put the space at the end of the first string.
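To make that caveat concrete, here is a minimal sketch, reusing the `sample_id` value from above, of what happens when the trailing space is forgotten. Python joins the strings exactly as written and gives no error to warn us.

```python
sample_id = 'abc123'

# without a trailing space the two strings are glued together
missing_space = 'Processed sample' + sample_id
# with the trailing space the output reads correctly
with_space = 'Processed sample ' + sample_id

print(missing_space)  # Processed sampleabc123
print(with_space)     # Processed sample abc123
```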

Things get more complicated as soon as we want to include variables that are not strings:

sample_id = 'abc123'
sample_size = 1234

# some more code

print('Processed sample ' + sample_id + ' with size ' + str(sample_size))
Processed sample abc123 with size 1234

since Python raises a TypeError if we forget to call str on the number.
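A small sketch of that failure mode with the same two variables: the `try` block deliberately leaves out `str` to trigger the error. The exact wording of the error message can differ between Python versions, so it is printed rather than quoted here.

```python
sample_id = 'abc123'
sample_size = 1234

try:
    # str + int is not defined, so this line raises TypeError
    message = 'Processed sample ' + sample_id + ' with size ' + sample_size
except TypeError as err:
    print('TypeError:', err)

# converting the number explicitly makes the concatenation work
message = 'Processed sample ' + sample_id + ' with size ' + str(sample_size)
print(message)  # Processed sample abc123 with size 1234
```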

The code gets even more complicated once we decide that we need to include a bit of processing:

sample_id = 'abc123'
sample_size = 1234
fruits_found = ['apple', 'banana', 'strawberry']

# splitting the print over multiple lines now
print(
    'Processed sample ' + sample_id + ' with size '
    + str(sample_size) + ' found fruits '
    + ','.join(fruits_found)
)
Processed sample abc123 with size 1234 found fruits apple,banana,strawberry

We could make the print a bit cleaner by turning some of the components into separate variables:

# preprocess variables
size_string = str(sample_size)
fruits_list = ','.join(fruits_found)

print(
    'Processed sample ' + sample_id + ' with size ' 
    + size_string + ' found fruits ' + fruits_list
)
Processed sample abc123 with size 1234 found fruits apple,banana,strawberry

but this clutters the code and creates a new way to go wrong: we might accidentally use the wrong variable later, resulting in an error:

# calculate fruits per sample size
len(fruits_found) / size_string
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[5], line 2
      1 # calculate fruits per sample size
----> 2 len(fruits_found) / size_string

TypeError: unsupported operand type(s) for /: 'int' and 'str'

or, even worse, an incorrect result:

# accidentally counting the number of characters
# rather than the number of elements
len(fruits_list) / sample_size
0.018638573743922204
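This mix-up is easy to make because `len` accepts both objects without complaint. A quick check with the same list as above shows why the result is wrong: one name holds a list of three items, the other a single 23-character string.

```python
fruits_found = ['apple', 'banana', 'strawberry']
fruits_list = ','.join(fruits_found)

n_elements = len(fruits_found)   # number of items in the list
n_characters = len(fruits_list)  # number of characters in 'apple,banana,strawberry'

print(n_elements)    # 3
print(n_characters)  # 23
```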

A really nice solution to this kind of variable interpolation is f-strings (available since Python 3.6). If we put the letter f before the opening quote, anything in the string that is surrounded by curly brackets is evaluated as a Python expression, and its value is inserted into the string. This works for variables:

f'Processed sample {sample_id}'
'Processed sample abc123'

and, helpfully, the result is always converted to a string automatically, so there is no need to call the str function explicitly:

f'Processed sample {sample_id} with size {sample_size}'
'Processed sample abc123 with size 1234'
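The automatic conversion isn't limited to integers. As a sketch (the `fruit_density` and `flags` values here are made-up examples, not part of the original code), any object can go inside the braces, and for built-in types like these the inserted text matches what `str` would give:

```python
sample_size = 1234
fruit_density = 0.5   # hypothetical example value
flags = None          # hypothetical example value
fruits_found = ['apple', 'banana']

line = f'size {sample_size}, density {fruit_density}, flags {flags}, fruits {fruits_found}'
print(line)  # size 1234, density 0.5, flags None, fruits ['apple', 'banana']

# for these built-in types the interpolated text is the same as str()
assert f'{sample_size}' == str(sample_size)
assert f'{fruits_found}' == str(fruits_found)
```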

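It also works for arbitrary expressions, not just variable names. Inside the braces we use double quotes, because reusing the outer single quote there is a syntax error in Python versions before 3.12:

f'Processed sample {sample_id} with size {sample_size} found fruits {",".join(fruits_found)}'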
'Processed sample abc123 with size 1234 found fruits apple,banana,strawberry'
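Anything that evaluates to a value can go inside the braces: function calls, method calls, indexing and arithmetic all work. A short sketch with the same variables (the particular expressions are just illustrations):

```python
sample_id = 'abc123'
sample_size = 1234
fruits_found = ['apple', 'banana', 'strawberry']

count_msg = f'{len(fruits_found)} fruits found'    # function call
id_msg = f'sample {sample_id.upper()}'             # method call
first_msg = f'first fruit is {fruits_found[0]}'    # indexing
double_msg = f'double the size is {sample_size * 2}'  # arithmetic

print(count_msg)   # 3 fruits found
print(id_msg)      # sample ABC123
print(first_msg)   # first fruit is apple
print(double_msg)  # double the size is 2468
```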

One way to spread the string over multiple lines is to end each line with a backslash:

f'Processed sample {sample_id} \
with size {sample_size} \
found fruits {",".join(fruits_found)}'
'Processed sample abc123 with size 1234 found fruits apple,banana,strawberry'
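The backslash is not the only option. A common alternative, sketched below, relies on Python treating adjacent string literals (f-strings included) as one string, so wrapping the pieces in parentheses lets each line be its own f-string with no backslashes. The spaces between words now have to sit at the end of each piece.

```python
sample_id = 'abc123'
sample_size = 1234
fruits_found = ['apple', 'banana', 'strawberry']

# adjacent literals inside the parentheses are joined into a single string
message = (
    f'Processed sample {sample_id} '
    f'with size {sample_size} '
    f'found fruits {",".join(fruits_found)}'
)
print(message)  # Processed sample abc123 with size 1234 found fruits apple,banana,strawberry
```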

A really nice feature of f-strings is that if we want to see the raw string, without the code interpolation, we can just temporarily remove the f from the start:

'Processed sample {sample_id} \
with size {sample_size} \
found fruits {",".join(fruits_found)}'
'Processed sample {sample_id} with size {sample_size} found fruits {",".join(fruits_found)}'

and we will see what Python is seeing.

Bonus: for numbers with many decimal places:

# calculate fruits per sample size
fruit_density = len(fruits_found) / sample_size
fruit_density
0.0024311183144246355

we can add a format specifier after a colon inside the curly brackets to round the value to a given number of decimal places:

# round the fruit density to 5 decimal places
f'Processed {sample_id} - fruit density {fruit_density:.5f}'
'Processed abc123 - fruit density 0.00243'
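The `.5f` after the colon is one example of a format specifier, and the same mini-language offers other options that come up often in scientific output. A sketch using the same density value and a hypothetical larger count (`n_reads` is made up for illustration):

```python
fruit_density = 3 / 1234  # same value as len(fruits_found) / sample_size
n_reads = 1234567         # hypothetical count, for illustration only

fixed = f'{fruit_density:.5f}'       # fixed-point, 5 decimal places
scientific = f'{fruit_density:.2e}'  # scientific notation, 2 decimal places
percent = f'{fruit_density:.3%}'     # multiplied by 100, with a % sign
grouped = f'{n_reads:,}'             # comma as thousands separator
padded = f'{n_reads:>10}'            # right-aligned in a 10-character field

print(fixed)         # 0.00243
print(scientific)    # 2.43e-03
print(percent)       # 0.243%
print(grouped)       # 1,234,567
print(repr(padded))  # '   1234567'

# the part after the colon follows the same rules as the built-in format()
assert fixed == format(fruit_density, '.5f')
```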

One more time: if you want to see the rest of these little write-ups, sign up for the mailing list:

Sign up for the mailing list